(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 751 494 A1

(12) EUROPEAN PATENT APPLICATION
published in accordance with Art. 158(3) EPC

(43) Date of publication:

(51) Int. Cl.6: G10L 9/14, G10L 9/18

(21) Application number: 95940473.2

(22) Date of filing: 19.12.1995

(86) International application number:

(87) International publication number:
WO 96/19798 (27.06.1996 Gazette 1996/29)


(84) Designated Contracting States:
AT DE ES FR GB IT NL

(30) Priority: 21.12.1994 JP 318689/94

(71) Applicant: SONY CORPORATION
Tokyo 141 (JP)

(72) Inventor: NISHIGUCHI, Masayuki
Sony Corporation
Tokyo 141 (JP)

(74) Representative: Ayers, Martyn Lewis Stanley
J.A. KEMP & CO.
14 South Square
Gray's Inn
London WC1R 5LX (GB)



(54) SOUND ENCODING SYSTEM 

(57) For executing code excitation linear prediction (CELP) coding, for example, α-parameters are extracted from the input speech signal by a linear prediction coding (LPC) analysis circuit 12. The α-parameters are then converted by an α-parameter to LSP converting circuit 13 into line spectral pair (LSP) parameters, and a vector of these LSP parameters is vector-quantized by a quantizer 14. The changeover switch 16 is controlled depending upon the pitch value detected by a pitch detection circuit 22, so as to select and use one of the codebook 15M for male voice and the codebook 15F for female voice, thereby improving the quantization characteristics without increasing the transmission bit rate.
Description

Technical Field 

This invention relates to a speech encoding method 
for encoding short-term prediction residuals or parame- 
ters representing short-term prediction coefficients of 
the input speech signal by vector or matrix quantization. 

Background Art 

There are a variety of encoding methods known for encoding the audio signal, including the speech signal and the acoustic signal, by exploiting statistical properties of the audio signal in the time domain and in the frequency domain and psychoacoustic characteristics of the human hearing system. These encoding methods may be roughly classified into time-domain encoding, frequency-domain encoding and analysis/synthesis encoding.

When various information data, such as spectral amplitudes or parameters thereof (e.g. LSP parameters, α-parameters or k-parameters), are quantized in high-efficiency coding of speech signals such as multi-band excitation (MBE), single-band excitation (SBE), harmonic excitation, sub-band coding (SBC), linear predictive coding (LPC), discrete cosine transform (DCT), modified DCT (MDCT) or fast Fourier transform (FFT), scalar quantization has usually been adopted.

If, with such scalar quantization, the bit rate is decreased to e.g. 3 to 4 kbps to further increase the quantization efficiency, the quantization noise or distortion increases, which makes practical use difficult. It is therefore current practice to group different data given for encoding, such as time-domain data, frequency-domain data or filter coefficient data, into a vector, or to group such vectors across plural frames into a matrix, and to perform vector or matrix quantization in place of quantizing the different data individually.

For example, in code excitation linear prediction (CELP) encoding, LPC residuals are directly quantized as a time-domain waveform by vector or matrix quantization. Likewise, the spectral envelope in MBE encoding is quantized by vector or matrix quantization.

If the bit rate is decreased further, it becomes infea- 
sible to use enough bits to quantize parameters specify- 
ing the envelope of the spectrum itself or the LPC 
residuals, thus deteriorating the signal quality. 

In view of the foregoing, it is an object of the present 
invention to provide a speech encoding method capable 
of affording satisfactory quantization characteristics 
even with a smaller number of bits. 

Disclosure of the Invention

With the speech encoding method according to the present invention, a first codebook and a second codebook are formed by assorting parameters representing short-term prediction values with respect to a reference parameter comprised of one or a combination of a plurality of characteristic parameters of the input speech signal. The short-term prediction values are generated based upon the input speech signal. One of the first and second codebooks is selected in relation to the reference parameter of the input speech signal, and the short-term prediction values are quantized by referring to the selected codebook, thereby encoding the input speech signal.

The short-term prediction values are short-term prediction coefficients or short-term prediction errors. The characteristic parameters include the pitch value of the speech signal, the pitch strength, the frame power, a voiced/unvoiced discrimination flag and the gradient of the signal spectrum. The quantization is vector quantization or matrix quantization. The reference parameter is the pitch value of the speech signal. One of the first and second codebooks is selected depending upon the magnitude relation between the pitch value of the input speech signal and a preset pitch value.

According to the present invention, the short-term prediction values generated based upon the input speech signal are quantized by referring to the selected codebook, which improves the quantization efficiency.

Brief Description of the Drawings 

Fig.1 is a schematic block diagram showing a speech encoding device (encoder) as an illustrative example of a device for carrying out the speech encoding method according to the present invention.

Fig.2 is a circuit diagram illustrating a smoother that may be employed in the pitch detection circuit shown in Fig.1.

Fig.3 is a block diagram illustrating the method for forming a codebook (training method) employed for vector quantization.

Best Mode for Carrying out the Invention 

Preferred embodiments of the present invention will be explained hereinafter.

Fig.1 is a schematic block diagram showing the constitution for carrying out the speech encoding method according to the present invention.

In the present speech signal encoder, the speech signals supplied to an input terminal 11 are supplied to a linear prediction coding (LPC) analysis circuit 12, a reverse-filtering circuit 21 and a perceptual weighting filter calculating circuit 23.

The LPC analysis circuit 12 applies a Hamming window to the input waveform signal, taking a length of the order of 256 samples of the input waveform signal as a block, and calculates linear prediction coefficients, or α-parameters, by the auto-correlation method. The frame period, as a data outputting unit, is e.g. 160 samples. If the sampling frequency fs is e.g. 8 kHz, the frame period is equal to 20 msec.
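As a rough illustration of the analysis step just described, the Python sketch below windows one block with a Hamming window, computes its auto-correlation and solves for the α-parameters with the Levinson-Durbin recursion, the usual way of solving the auto-correlation-method normal equations. The function names, the order P = 10 and the handling of silent blocks are assumptions for illustration, not details taken from this application. The sign convention follows equation (1) of this description, A(z) = 1 + Σ αi z^-i.

```python
import math

def hamming(n_len):
    """Hamming window of length n_len (n_len >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_len - 1))
            for n in range(n_len)]

def autocorrelation(x, max_lag):
    """Auto-correlation r[0..max_lag] of the (windowed) block x."""
    n_len = len(x)
    return [sum(x[n] * x[n + k] for n in range(n_len - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Find A(z) = 1 + sum_i a_i z^-i minimizing the residual energy.

    Returns (a, err) with a = [1, a_1, ..., a_P] and err the final
    prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def lpc_analysis(block, order=10):
    """Hypothetical helper: alpha_1..alpha_P for one analysis block."""
    w = hamming(len(block))
    xw = [s * wn for s, wn in zip(block, w)]
    r = autocorrelation(xw, order)
    if r[0] == 0.0:                         # silent block: no prediction possible
        return [0.0] * order
    a, _ = levinson_durbin(r, order)
    return a[1:]
```

For a 256-sample block, `lpc_analysis(block)` returns ten coefficients that can be passed on to the LSP conversion.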

The α-parameters from the LPC analysis circuit 12 are supplied to an α-to-LSP converting circuit 13 for conversion to line spectral pair (LSP) parameters. That is, the α-parameters, found as direct-type filter coefficients, are converted into e.g. ten LSP parameters, that is, five pairs. This conversion is carried out using e.g. the Newton-Raphson method. The α-parameters are converted into LSP parameters because LSP parameters are superior to α-parameters in interpolation characteristics.
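The α-to-LSP conversion can be sketched as follows. This is an illustration under stated assumptions: instead of the Newton-Raphson method mentioned above, it locates the unit-circle zeros of the sum and difference polynomials P(z) and Q(z) with a grid search followed by bisection, a common alternative. It assumes an even order such as ten, and the name `alpha_to_lsp` is hypothetical.

```python
import math

def alpha_to_lsp(alpha, grid=1024, iters=50):
    """Convert alpha-parameters to LSP frequencies in radians, in (0, pi).

    A(z) = 1 + sum_i alpha_i z^-i.  P(z) = A(z) + z^-(P+1) A(1/z) and
    Q(z) = A(z) - z^-(P+1) A(1/z); the LSPs are the angles of their
    unit-circle zeros, excluding the trivial zeros at z = -1 and z = 1.
    """
    p = len(alpha)
    a = [1.0] + list(alpha) + [0.0]                   # A(z), padded to P + 2
    pc = [a[k] + a[p + 1 - k] for k in range(p + 2)]  # symmetric P(z)
    qc = [a[k] - a[p + 1 - k] for k in range(p + 2)]  # antisymmetric Q(z)
    half = (p + 1) / 2.0

    # On the unit circle, e^{jw(P+1)/2} P(e^{jw}) is real and
    # e^{jw(P+1)/2} Q(e^{jw}) is purely imaginary; these are their amplitudes.
    def f_p(w):
        return sum(c * math.cos(w * (half - k)) for k, c in enumerate(pc))

    def f_q(w):
        return sum(c * math.sin(w * (half - k)) for k, c in enumerate(qc))

    grid_w = [math.pi * (i + 0.5) / grid for i in range(grid)]
    roots = []
    for f in (f_p, f_q):
        vals = [f(w) for w in grid_w]
        for i in range(grid - 1):
            lo, hi, flo = grid_w[i], grid_w[i + 1], vals[i]
            if flo == 0.0:
                roots.append(lo)
            elif flo * vals[i + 1] < 0.0:
                for _ in range(iters):                # bisection refinement
                    mid = 0.5 * (lo + hi)
                    fm = f(mid)
                    if flo * fm <= 0.0:
                        hi = mid
                    else:
                        lo, flo = mid, fm
                roots.append(0.5 * (lo + hi))
    return sorted(roots)
```

For A(z) = 1 (all αi zero) the ten LSPs come out equally spaced at kπ/11, k = 1, ..., 10, which is a convenient sanity check. For a stable A(z) the zeros of P(z) and Q(z) interlace, so the sorted list alternates between the two polynomials.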

The LSP parameters from the α-to-LSP converting circuit 13 are vector-quantized by an LSP vector quantizer 14. At this time, the inter-frame difference may first be found before carrying out the vector quantization. Alternatively, plural LSP parameters for plural frames may be grouped together for matrix quantization. For this quantization, 20 msec corresponds to one frame, and the LSP parameters calculated every 20 msec are vector-quantized. For carrying out the vector or matrix quantization, a codebook for male voice 15M or a codebook for female voice 15F is used, switching between them with a changeover switch 16 in accordance with the pitch.
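A minimal sketch of codebook-switched vector quantization is shown below. It assumes a plain (unweighted) squared Euclidean distance and codebooks held in a dictionary keyed by a hypothetical label such as "male" or "female"; the label is assumed to come from the pitch analysis. Practical LSP quantizers often use a weighted distance, which is not modeled here.

```python
def nearest_index(vector, codebook):
    """Index of the codevector with the smallest squared error."""
    best_i, best_d = 0, float("inf")
    for i, code in enumerate(codebook):
        d = sum((v - c) ** 2 for v, c in zip(vector, code))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def quantize_lsp(lsp, codebooks, selector):
    """Quantize one LSP vector with the codebook chosen by `selector`.

    codebooks: dict mapping a label (e.g. "male", "female") to a list
    of codevectors.  Returns (index to transmit, quantized vector).
    """
    book = codebooks[selector]
    idx = nearest_index(lsp, book)
    return idx, book[idx]
```

Only the index is transmitted; the decoder must make the same codebook choice, which is why the selection criterion should rely on parameters the decoder also receives.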

A quantization output of the LSP vector quantizer 14, that is, the index of the LSP vector quantization, is provided as an output, and the quantized LSP vectors are processed by an LSP-to-α converting circuit 17 for conversion of the LSP parameters into α-parameters as coefficients of the direct-type filter. Based upon the output of the LSP-to-α converting circuit 17, the filter coefficients of a perceptual weighting synthesis filter 31 for code excitation linear prediction (CELP) encoding are calculated.

An output of a so-called dynamic codebook 32 (pitch codebook, also called an adaptive codebook) for code excitation linear prediction (CELP) encoding is supplied to an adder 34 via a coefficient multiplier 33 designed for multiplying by a gain g0. On the other hand, an output of a so-called stochastic codebook 35 (noise codebook, also called a probabilistic codebook) is supplied to the adder 34 via a coefficient multiplier 36 designed for multiplying by a gain g1. A sum output of the adder 34 is supplied as an excitation signal to the perceptual weighting synthesis filter 31.
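The excitation formed by the adder 34 can be sketched as follows; the sketch also returns the updated excitation history, since the sum is fed back to the dynamic codebook 32 as described below. Repeating the last `lag` samples when the pitch lag is shorter than the excitation vector is a common CELP convention assumed here, and the helper names are hypothetical.

```python
def adaptive_codevector(past_exc, lag, length):
    """Read `length` samples from the past excitation at pitch lag `lag`.

    Assumes len(past_exc) >= lag.  If lag < length, the last `lag`
    samples are repeated periodically (an assumed convention).
    """
    start = len(past_exc) - lag
    return [past_exc[start + (n % lag)] for n in range(length)]

def build_excitation(past_exc, lag, stochastic_vec, g0, g1):
    """Excitation = g0 * dynamic-codebook vector + g1 * stochastic vector.

    Returns (excitation, updated history); the history keeps a fixed
    length, mimicking the feedback into the dynamic codebook.
    """
    v = adaptive_codevector(past_exc, lag, len(stochastic_vec))
    exc = [g0 * a + g1 * c for a, c in zip(v, stochastic_vec)]
    history = (past_exc + exc)[-len(past_exc):]
    return exc, history
```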

Past excitation signals are stored in the dynamic codebook 32. These excitation signals are read out at a pitch period and multiplied by the gain g0. The resulting product signal is summed by the adder 34 with the signal from the stochastic codebook 35 multiplied by the gain g1. The resulting sum signal is used for exciting the perceptual weighting synthesis filter 31. In addition, the sum output from the adder 34 is fed back to the dynamic codebook 32 to form a sort of IIR filter. The stochastic codebook 35 is configured so that the changeover switch 35S switches between the codebook 35M for male voice and the codebook 35F for female voice to select one of the codebooks. The coefficient multipliers 33, 36 have their respective gains g0, g1 controlled responsive to outputs of the gain codebook 37. An output of the perceptual weighting synthesis filter 31 is supplied as a subtraction signal to an adder 38. An output signal of the adder 38 is supplied to a waveform distortion (Euclidean distance) minimizing circuit 39. Based upon an output of the waveform distortion minimizing circuit 39, signal readout from the respective codebooks 32, 35 and 37 is controlled so as to minimize the output of the adder 38, that is, the weighted waveform distortion.
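The search controlled by the waveform distortion minimizing circuit 39 is an analysis-by-synthesis loop. The sketch below shows the standard single-codebook version under simplifying assumptions: the weighted synthesis filter is represented by a truncated impulse response `h`, the gain is left unquantized (in the encoder of Fig.1 it would come from the gain codebook 37), and in practice the dynamic and stochastic codebooks would be searched one after the other. For a candidate whose filtered output is y, the optimal gain is ⟨t, y⟩/⟨y, y⟩ and the remaining error is ‖t‖² - ⟨t, y⟩²/⟨y, y⟩, so maximizing ⟨t, y⟩²/⟨y, y⟩ is equivalent. Names are hypothetical.

```python
def zero_state_response(h, c):
    """Filter codevector c through impulse response h (zero initial state)."""
    return [sum(h[m] * c[n - m] for m in range(min(n + 1, len(h))))
            for n in range(len(c))]

def search_codebook(target, codebook, h):
    """Return (index, gain) minimizing ||target - gain * (h * c)||^2."""
    best_i, best_gain, best_score = None, 0.0, -1.0
    for i, c in enumerate(codebook):
        y = zero_state_response(h, c)
        ty = sum(a * b for a, b in zip(target, y))   # <t, y>
        yy = sum(b * b for b in y)                   # <y, y>
        if yy <= 0.0:
            continue                                 # skip all-zero outputs
        score = ty * ty / yy
        if score > best_score:
            best_i, best_gain, best_score = i, ty / yy, score
    return best_i, best_gain
```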

In the reverse-filtering circuit 21, the input speech signal from the input terminal 11 is back-filtered using the α-parameters from the LPC analysis circuit 12 and supplied to a pitch detection circuit 22 for pitch detection. The changeover switch 16 or the changeover switch 35S is changed over responsive to the pitch detection results from the pitch detection circuit 22 for selectively switching between the codebook for male voice and the codebook for female voice.

In the perceptual weighting filter calculating circuit 23, perceptual weighting filter calculation is carried out on the input speech signal from the input terminal 11 using an output of the LPC analysis circuit 12. The resulting perceptually weighted signal is supplied to an adder 24, which is also fed with an output of a zero input response circuit 25 as a subtraction signal. The zero input response circuit 25 synthesizes the response of the previous frame by a weighted synthesis filter and outputs a synthesized signal. This synthesized signal is subtracted from the perceptually weighted signal in order to cancel the filter response of the previous frame remaining in the perceptual weighting synthesis filter 31, thereby producing the signal required as a new input for the decoder. An output of the adder 24 is supplied to the adder 38, where the output of the perceptual weighting synthesis filter 31 is subtracted from it.
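The role of the zero input response circuit 25 can be illustrated with a plain all-pole filter 1/(1 + Σ αi z^-i) whose memory is carried from frame to frame. For brevity the perceptual weighting W(z) is omitted, so this is a simplified sketch rather than the weighted filter of the encoder; the helper names are hypothetical.

```python
def synthesize(excitation, alpha, memory):
    """All-pole filter y(n) = e(n) - sum_i alpha_i y(n - i).

    memory holds the last P outputs, most recent first, and is returned
    updated so the next frame can continue from it.
    """
    mem = list(memory)
    out = []
    for e in excitation:
        y = e - sum(a * m for a, m in zip(alpha, mem))
        out.append(y)
        mem = [y] + mem[:-1]
    return out, mem

def zero_input_response(alpha, memory, length):
    """Ringing left over from the previous frame, with zero input."""
    out, _ = synthesize([0.0] * length, alpha, memory)
    return out
```

The target for the codebook search would then be the weighted input minus this ringing, e.g. `[w - z for w, z in zip(weighted, zero_input_response(alpha, mem, len(weighted)))]`, so that each new frame is searched from a zero filter state.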

In the above-described encoder, let the input signal from the input terminal 11 be x(n), the LPC coefficients, i.e. the α-parameters, be αi, and the prediction residuals be res(n), where, with an analysis order of P, 1 ≤ i ≤ P. The input signal x(n) is back-filtered by the reverse-filtering circuit 21 in accordance with equation (1):

$$H(z) = 1 + \sum_{i=1}^{P} \alpha_i z^{-i} \qquad (1)$$

to find the prediction residual res(n) in a range of e.g. 0 ≤ n ≤ N-1, where N denotes the number of samples corresponding to the frame length as an encoding unit. For example, N = 160.
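Equation (1) corresponds to the FIR inverse filter res(n) = x(n) + Σ αi x(n-i). The sketch below assumes the P samples preceding the frame are passed in (zeros otherwise); `lpc_residual` is a hypothetical name.

```python
def lpc_residual(x, alpha, history=None):
    """res(n) = x(n) + sum_{i=1}^{P} alpha_i x(n - i), for n = 0..N-1.

    history: the P samples preceding x, oldest first (zeros if omitted).
    """
    p = len(alpha)
    past = list(history) if history is not None else [0.0] * p
    buf = past + list(x)
    return [buf[p + n] + sum(alpha[i - 1] * buf[p + n - i] for i in range(1, p + 1))
            for n in range(len(x))]
```

A signal that exactly follows the predictor, such as x(n) = 0.5^n with α1 = -0.5, leaves only the initial sample in the residual.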

Next, in the pitch detection circuit 22, the prediction residual res(n) obtained from the reverse-filtering circuit 21 is passed through a low-pass filter (LPF) to derive resl(n). Such an LPF usually has a cut-off frequency fc of the order of 1 kHz for a sampling clock frequency fs of 8 kHz. Next, the auto-correlation function φresl(i) of resl(n) is calculated in accordance with equation (2):

$$\phi_{\mathrm{resl}}(i) = \sum_{n=0}^{N-i-1} \mathrm{resl}(n)\,\mathrm{resl}(n+i) \qquad (2)$$

where Lmin ≤ i < Lmax. Usually, Lmin is approximately 20 and Lmax approximately 147. The number i giving a peak value of the auto-correlation function φresl(i), found by tracking, or the number i giving a peak value after suitable processing, is employed as the pitch for the current frame. Let the pitch, more specifically the pitch lag, of the k-th frame be P(k).
On the other hand, the pitch reliability or pitch strength is defined by equation (3):

$$\mathrm{Pl}(k) = \frac{\phi_{\mathrm{resl}}(P(k))}{\phi_{\mathrm{resl}}(0)} \qquad (3)$$

That is, the strength of the auto-correlation, normalized by φresl(0), is defined as above.
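The pitch lag and pitch strength of equations (2) and (3) can be sketched as plain peak picking on the auto-correlation of resl(n). The sketch assumes resl(n) has already been low-pass filtered and omits any pitch tracking; the defaults Lmin = 20 and Lmax = 147 are the typical values quoted above, and the function names are hypothetical.

```python
def autocorr_at(x, i):
    """phi(i) = sum_{n=0}^{N-1-i} x(n) x(n+i), as in equation (2)."""
    return sum(x[n] * x[n + i] for n in range(len(x) - i))

def pitch_lag_and_strength(resl, l_min=20, l_max=147):
    """Peak picking over l_min <= i < l_max.

    Returns (P, Pl) with Pl = phi(P) / phi(0), as in equation (3).
    """
    phi0 = autocorr_at(resl, 0)
    if phi0 == 0.0:
        return None, 0.0                     # silent frame: no pitch
    lags = range(l_min, min(l_max, len(resl)))
    best = max(lags, key=lambda i: autocorr_at(resl, i))
    return best, autocorr_at(resl, best) / phi0
```

For a 160-sample pulse train with a period of 40 samples (200 Hz at 8 kHz), the lag is 40 and the strength is 3/4, because only three of the four pulses have a partner 40 samples later inside the frame.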

In addition, as in usual code excitation linear prediction (CELP) coding, the frame power R0(k) is calculated by equation (4):

$$R_0(k) = \frac{1}{N} \sum_{n=0}^{N-1} x^2(n) \qquad (4)$$

where k denotes the frame number.

Depending upon the values of the pitch lag P(k), the pitch strength Pl(k) and the frame power R0(k), the quantization table for {αi}, or the quantization table formed by converting the α-parameters into line spectral pairs (LSPs), is changed over between the codebook for male voice and the codebook for female voice. In the embodiment of Fig.1, the quantization table of the vector quantizer 14 used for quantizing the LSPs is changed over between the codebook for male voice 15M and the codebook for female voice 15F.

For example, let Pth denote the threshold value of the pitch lag P(k) used for distinguishing between male voice and female voice, and let Plth and R0th denote the respective threshold values of the pitch strength Pl(k), for discriminating pitch reliability, and of the frame power R0(k). Then

(i) a first codebook, e.g. the codebook for male voice 15M, is used for P(k) ≥ Pth, Pl(k) > Plth and R0(k) > R0th;

(ii) a second codebook, e.g. the codebook for female voice 15F, is used for P(k) < Pth, Pl(k) > Plth and R0(k) > R0th; and

(iii) a third codebook is used otherwise.

Although a codebook different from the codebook 35M for male voice and the codebook 35F for female voice may be employed as the third codebook, it is also possible to employ the codebook 35M for male voice or the codebook 35F for female voice as the third codebook.

The above threshold values may be exemplified by Pth = 45, Plth = 0.7 and R0th = (full scale - 40 dB).
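Rules (i) to (iii), together with the frame power of equation (4), could be sketched as below. The labels and the conversion of "full scale - 40 dB" into a power threshold are assumptions: the text does not define the full-scale reference, so `full_scale_power` is a hypothetical parameter, and the default R0th of 0.0 is only a placeholder.

```python
def frame_power(x):
    """R0(k) = (1/N) * sum of x(n)^2 over the frame, equation (4)."""
    return sum(s * s for s in x) / len(x)

def power_threshold(full_scale_power, db_below=40.0):
    """R0th as 'full scale - 40 dB', assuming a known full-scale power."""
    return full_scale_power * 10.0 ** (-db_below / 10.0)

def select_codebook(p_lag, pl, r0, p_th=45, pl_th=0.7, r0_th=0.0):
    """Rules (i)-(iii): long lag (low pitch) -> first codebook ("male"),
    short lag -> second codebook ("female"), unreliable frame -> "third"."""
    if pl > pl_th and r0 > r0_th:
        return "male" if p_lag >= p_th else "female"
    return "third"
```

At fs = 8 kHz, the threshold lag of 45 samples corresponds to a pitch of roughly 178 Hz.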

Alternatively, the codebooks may be changed over by preserving the pitch lags P(k) of the past n frames, finding the mean value of P(k) over these n frames and discriminating the mean value with the preset threshold value Pth. These n frames are selected so that Pl(k) > Plth and R0(k) > R0th, that is, so that they are voiced frames exhibiting high pitch reliability.
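The averaging alternative can be sketched with a fixed-length buffer that only accepts voiced, reliable frames. The value of n is not given in the text, so n = 5 below is an arbitrary assumption, as is the class name.

```python
from collections import deque

class ReliablePitchMean:
    """Mean pitch lag of the last n voiced, reliable frames (hypothetical helper)."""

    def __init__(self, n=5, p_th=45, pl_th=0.7, r0_th=0.0):
        self.lags = deque(maxlen=n)          # oldest lag drops out automatically
        self.p_th, self.pl_th, self.r0_th = p_th, pl_th, r0_th

    def update(self, p_lag, pl, r0):
        """Store the lag only if Pl(k) > Plth and R0(k) > R0th."""
        if pl > self.pl_th and r0 > self.r0_th:
            self.lags.append(p_lag)

    def is_male(self):
        """True/False from the mean lag versus Pth; None before any reliable frame."""
        if not self.lags:
            return None
        return sum(self.lags) / len(self.lags) >= self.p_th
```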

Still alternatively, the pitch lag P(k) satisfying the above condition may be supplied to the smoother shown in Fig.2, and the resulting smoothed output may be discriminated by the threshold value Pth for changing over the codebooks. The output of the smoother of Fig.2 is obtained by multiplying the input data by 0.2 in a multiplier 41 and summing, in an adder 44, the resulting product with the output data delayed by one frame by a delay circuit 42 and multiplied by 0.8 in a multiplier 43. The output state of the smoother is maintained as long as no pitch lag P(k), that is, no input data, is supplied.
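The smoother of Fig.2 is a first-order recursive filter, y(k) = 0.2 P(k) + 0.8 y(k-1), whose output is held when no reliable pitch lag arrives. A minimal sketch, with a hypothetical class name and an assumed initial state:

```python
class PitchSmoother:
    """y(k) = 0.2 * P(k) + 0.8 * y(k-1); the output is held without input."""

    def __init__(self, initial=0.0):
        self.y = initial                     # assumed initial delay-circuit state

    def step(self, p_lag=None):
        """Feed a reliable pitch lag, or None to hold the previous output."""
        if p_lag is not None:
            self.y = 0.2 * p_lag + 0.8 * self.y
        return self.y
```

The smoothed value returned by `step` would then be compared with Pth to choose between the two codebooks.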

In combination with the above-described switching, the codebooks may also be changed over depending upon the voiced/unvoiced discrimination, the value of the pitch strength Pl(k) or the value of the frame power R0(k).

In this manner, the mean value of the pitch is extracted from a stable pitch section, and it is discriminated whether the input speech is male or female speech for switching between the codebook for male voice and the codebook for female voice. The reason is that, since the frequency distribution of the vowel formants differs between male and female voices, switching between male voice and female voice, especially in vowel portions, reduces the space occupied by the vectors to be quantized, that is, diminishes the vector variance, thus enabling satisfactory training, that is, learning that reduces the quantization error.

It is also possible to change over the stochastic codebook in CELP coding in accordance with the above conditions. In the embodiment of Fig.1, the changeover switch 35S is changed over in accordance with the above conditions for selecting one of the codebook 35M for male voice and the codebook 35F for female voice as the stochastic codebook 35.

For codebook learning, the training data may be assorted under the same standard as that used for encoding/decoding, so that the training data will be optimized under, e.g., the so-called LBG method.

That is, referring to Fig.3, signals from a training set 51, made up of speech signals for training lasting e.g. several minutes, are supplied to a line spectral pair (LSP) calculating circuit 52 and a pitch discriminating circuit 53. The LSP calculating circuit 52 is equivalent to, e.g., the LPC analysis circuit 12 and the α-to-LSP converting circuit 13 of Fig.1, while the pitch discriminating circuit 53 is equivalent to the reverse-filtering circuit 21 and the pitch detection circuit 22 of Fig.1. The pitch discriminating circuit 53 discriminates the pitch lag P(k), the pitch strength Pl(k) and the frame power R0(k) by the above-mentioned threshold values Pth, Plth and R0th for case classification in accordance with the above conditions (i), (ii) and (iii). Specifically, it suffices to discriminate at least between the male voice under condition (i) and the female voice under condition (ii). Alternatively, the pitch lag values P(k) of the past n voiced frames with high pitch reliability may be preserved, and the mean value of the P(k) values of these n frames may be found and discriminated by the threshold value Pth. An output of the smoother of Fig.2 may also be discriminated by the threshold value Pth.

The LSP data from the LSP calculating circuit 52 are sent to a training data assorting circuit 54, where the LSP data are assorted into training data for male voice 55 and training data for female voice 56 depending upon the discrimination output of the pitch discriminating circuit 53. These training data are supplied to training processors 57, 58, where training is carried out in accordance with e.g. the so-called LBG method for formulating the codebook 35M for male voice and the codebook 35F for female voice. The LBG method is a method for codebook training proposed in Linde, Y., Buzo, A. and Gray, R.M., "An Algorithm for Vector Quantizer Design", IEEE Trans. Comm., COM-28, pp. 84-95, Jan. 1980. Specifically, it is a technique for designing a locally optimum vector quantizer for an information source whose probability density function is not known, with the aid of a so-called training string.
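The assortment of Fig.3 followed by LBG training might be sketched as below. This is a simplified textbook form of the LBG algorithm (codevector splitting plus a fixed number of nearest-neighbour/centroid iterations with unweighted squared error), not a reproduction of the training processors 57, 58. The perturbation `eps`, the iteration count, the power-of-two codebook size and the function names are assumptions.

```python
def _dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def assort(frames, labels):
    """Split training vectors by the male/female decision; 'third' frames are dropped."""
    male = [f for f, lab in zip(frames, labels) if lab == "male"]
    female = [f for f, lab in zip(frames, labels) if lab == "female"]
    return male, female

def lbg(training, size, eps=0.01, iters=20):
    """LBG design: start from the global centroid, split, then refine.

    size is assumed to be a power of two.
    """
    codebook = [_centroid(training)]
    while len(codebook) < size:
        # Split every codevector into two slightly perturbed copies.
        codebook = [[c + s * eps for c in v] for v in codebook for s in (1, -1)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for t in training:               # nearest-neighbour partition
                j = min(range(len(codebook)), key=lambda i: _dist(t, codebook[i]))
                cells[j].append(t)
            # Centroid update; an empty cell keeps its previous codevector.
            codebook = [_centroid(cell) if cell else codebook[i]
                        for i, cell in enumerate(cells)]
    return codebook
```

Running `lbg` separately on the two outputs of `assort` yields one codebook per class, mirroring the two training paths of Fig.3.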

The codebook 15M for male voice and the codebook 15F for female voice, thus formulated, are selected by switching the changeover switch 16 at the time of vector quantization by the vector quantizer 14 shown in Fig.1. This changeover switch 16 is switched depending upon the results of discrimination by the pitch detection circuit 22.

The index information, as the quantization output of the vector quantizer 14, that is, the codes of the representative vectors, is output as data to be transmitted, while the quantized LSP data of the output vector are converted by the LSP-to-α converting circuit 17 into α-parameters, which are fed to the perceptual weighting synthesis filter 31. This perceptual weighting synthesis filter 31 has the characteristics 1/A(z) shown by the following equation (5):

$$\frac{1}{A(z)} = \frac{W(z)}{1 + \sum_{i=1}^{P} \alpha_i z^{-i}} \qquad (5)$$

where W(z) denotes the perceptual weighting characteristics.

Among the data to be transmitted in the above-described CELP encoding are the index information for the dynamic codebook 32 and the stochastic codebook 35, the index information of the gain codebook 37 and the pitch information of the pitch detection circuit 22, in addition to the index information of the representative vectors in the vector quantizer 14. Since the pitch values or the index of the dynamic codebook are parameters that must inherently be transmitted, the quantity of transmitted information, or the transmission rate, is not increased. However, if parameters that are not inherently transmitted, such as the pitch information, are to be used as the reference basis for switching between the codebook for male voice and that for female voice, separate codebook switching information must be transmitted.

It is noted that the discrimination between male voice and female voice need not coincide with the sex of the speaker, provided that the codebook selection is made under the same standard as that used for assorting the training data. Thus the names "codebook for male voice" and "codebook for female voice" are used merely for convenience. In the present embodiment, the codebooks are changed over depending upon the pitch value by exploiting the fact that a correlation exists between the pitch value and the shape of the spectral envelope.

The present invention is not limited to the above embodiments. Although each component of the arrangement of Fig.1 is described as hardware, it may also be implemented by a software program using a so-called digital signal processor (DSP). The low-range side codebook of band-splitting vector quantization, or a partial codebook such as a codebook for a part of multistage vector quantization, may be switched between plural codebooks for male voice and for female voice. In addition, matrix quantization may be executed in place of vector quantization by grouping data of plural frames together. Furthermore, the speech encoding method according to the present invention is not limited to the linear prediction coding method employing code excitation, but may also be applied to a variety of speech encoding methods in which the voiced portion is synthesized by sine wave synthesis and the non-voiced portion is synthesized based upon a noise signal. As for usage, the present invention is not limited to transmission or recording/reproduction, but may be applied to a variety of uses, such as pitch conversion, speech modification, speech synthesis by rule or noise suppression.

Industrial Applicability

As will be apparent from the foregoing description, the speech encoding method according to the present invention provides a first codebook and a second codebook formed by assorting parameters representing short-term prediction values with respect to a reference parameter comprised of one or a combination of a plurality of characteristic parameters of the input speech signal. The short-term prediction values are generated based upon the input speech signal, and one of the first and second codebooks is selected in connection with the reference parameter of the input speech signal. The short-term prediction values are encoded by referring to the selected codebook for encoding the input speech signal. This improves the quantization efficiency. For example, the signal quality may be improved without increasing the transmission bit rate, or the transmission bit rate may be lowered further while suppressing deterioration in the signal quality.

Claims

1. A speech encoding method comprising:

generating values relating short-term prediction based upon an input speech signal;
providing a first codebook and a second codebook formed by assorting parameters representing said values relating short-term prediction in relation to a reference parameter and producing data based upon the assorted parameters, said reference parameter comprised of one or a combination of a plurality of characteristic parameters of the input speech signal;
selecting one of the first and second codebooks in relation to said reference parameter of said input speech signal; and
quantizing said values relating short-term prediction by referring to the selected codebook for encoding said input speech signal.

2. The speech encoding method as claimed in claim 1 
wherein said values relating short-term prediction 
are short-term prediction coefficients. 


3. The speech encoding method as claimed in claim 1 
wherein said values relating short-term prediction 
are short-term prediction errors. 

4. The speech encoding method as claimed in claim 1
wherein said characteristic parameters are the 
pitch value of a speech signal, pitch strength, frame 
power, a voiced/unvoiced discrimination flag and 
the gradient of the signal spectrum. 


5. The speech encoding method as claimed in claim 1 
wherein said values relating short-term prediction 
are vector-quantized for encoding the input speech 
signal. 


6. The speech encoding method as claimed in claim 1 wherein said values relating short-term prediction are matrix-quantized for encoding the input speech signal.



7. The speech encoding method as claimed in claim 1 
wherein said reference parameter is the pitch value 
of the speech signal and wherein one of the first 
codebook and the second codebook is selected 
depending upon the magnitude relation of the pitch 
value of the input speech signal and a pre-set pitch 
value. 

Amended claims under Art. 19.1 PCT

1. A speech encoding device comprising:

short-term prediction means for generating short-term prediction coefficients based on input speech signals;
a plurality of codebooks formed by assorting parameters specifying the short-term prediction coefficients with respect to reference parameters, said reference parameters being the combination of one or more of a plurality of characteristic parameters of speech signals;
selection means for selecting one of said codebooks in relation to said reference parameters of said input speech signals; and
quantization means for quantizing said short-term prediction coefficients by referring to the codebook selected by said selection means;
wherein the improvement resides in that
an excitation signal is optimized using a quantized value from said quantization means.

2. The speech encoding device as claimed in claim 1 wherein said characteristic parameters include a pitch value of speech signals, pitch strength, frame power, a voiced/unvoiced discrimination flag and the gradient of the signal spectrum.

3. The speech encoding device as claimed in claim 
1 wherein said quantization means vector-quan- 
tizes said short-term prediction coefficients. 

4. The speech encoding device as claimed in claim 1 wherein said quantization means matrix-quantizes said short-term prediction coefficients.

5. The speech encoding device as claimed in claim 
1 wherein said reference parameter is a pitch value 
of speech signals, said selection means selects 
one of said codebooks responsive to the relative 
magnitude of the pitch value of said input speech 
signals and said pre-set pitch value. 

6. The speech encoding device as claimed in claim 1 wherein said codebooks include a codebook for a male voice and a codebook for a female voice.

7. A speech encoding method comprising:

generating short-term prediction coefficients based on input speech signals;
providing a plurality of codebooks formed by assorting parameters specifying the short-term prediction coefficients with respect to reference parameters, said reference parameters being the combination of one or more of characteristic parameters of speech signals;
selecting one of said codebooks in relation to said reference parameters of said input speech signals;
quantizing said short-term prediction coefficients by referring to the selected codebook; and
optimizing an excitation signal using a quantized value of said short-term prediction coefficients.

8. The speech encoding method as claimed in claim 7 wherein said characteristic parameters include a pitch value of speech signals, pitch strength, frame power, a voiced/unvoiced discrimination flag and the gradient of the signal spectrum.

9. The speech encoding method as claimed in claim 7 wherein said short-term prediction coefficients are vector-quantized for encoding the input speech signals.

10. The speech encoding method as claimed in claim 7 wherein said short-term prediction coefficients are matrix-quantized for encoding the input speech signals.

11. The speech encoding method as claimed in claim 7 wherein said reference parameter is a pitch value of speech signals and wherein one of said codebooks is selected responsive to the relative magnitude of the pitch value of said input speech signals and said pre-set pitch value.

12. The speech encoding method as claimed in claim 7 wherein said codebooks include a codebook for a male voice and a codebook for a female voice.

13. A speech encoding device comprising:

short-term prediction means for generating short-term prediction coefficients based on input speech signals;
a first plurality of codebooks formed by assorting parameters specifying the short-term prediction coefficients with respect to reference parameters, said reference parameters being the combination of one or more of characteristic parameters of speech signals;
selection means for selecting one of said codebooks in relation to said reference parameters of said input speech signals; and
quantization means for quantizing said short-term prediction coefficients by referring to the codebook selected by said selection means;
a second plurality of codebooks formed on the basis of training data assorted with respect to reference parameters, said reference parameters being the combination of one or more of characteristic parameters of speech signals, one of said second plurality of codebooks being selected as the codebook of the first plurality of codebooks is selected by said selection means; and
synthesis means for synthesizing, on the basis of the quantized value from said quantization means, an excitation signal related to outputting of the selected codebook of said second plurality of codebooks;
said excitation signal being optimized responsive to an output of said synthesis means.

14. The speech encoding device as claimed in claim 13 wherein said characteristic parameters include a pitch value of speech signals, pitch strength, frame power, a voiced/unvoiced discrimination flag and the gradient of the signal spectrum.

15. The speech encoding device as claimed in claim 13 wherein said quantization means vector-quantizes said short-term prediction coefficients.

16. The speech encoding device as claimed in claim 13 wherein said quantization means matrix-quantizes said short-term prediction coefficients.

17. The speech encoding device as claimed in claim 13 wherein said reference parameter is a pitch value of speech signals and wherein said selection means selects one of said first plurality of codebooks responsive to the relative magnitude of the pitch value of said input speech signals and said pre-set pitch value.

18. The speech encoding device as claimed in claim 13 wherein each of said first plurality of codebooks and said second plurality of codebooks includes a codebook for a male voice and a codebook for a female voice.

19. A speech encoding method comprising:

generating short-term prediction coefficients based on input speech signals;
providing a first plurality of codebooks formed by assorting parameters specifying the short-term prediction coefficients with respect to reference parameters, said reference parameters being the combination of one or more characteristic parameters of speech signals;
selecting one of said first plurality of codebooks in relation to said reference parameters of said input speech signals;
quantizing said short-term prediction coefficients by referring to the selected codebook;
providing a second plurality of codebooks formed on the basis of training data assorted with respect to reference parameters, said reference parameters being the combination of one or more characteristic parameters of speech signals, one of said second plurality of codebooks being selected with selection of the codebook of the first plurality of codebooks; and
synthesizing, on the basis of the quantized value of said short-term prediction coefficients, an excitation signal related to the output of the selected codebook of said second plurality of codebooks for optimizing said excitation signal.



20. The speech encoding method as claimed in claim 19 wherein said characteristic parameters include a pitch value of speech signals, pitch strength, frame power, a voiced/unvoiced discrimination flag and the gradient of the signal spectrum.

21. The speech encoding method as claimed in claim 19 wherein said short-term prediction coefficients are vector-quantized for encoding the input speech signals.

22. The speech encoding method as claimed in claim 19 wherein said short-term prediction coefficients are matrix-quantized for encoding the input speech signals.

23. The speech encoding method as claimed in claim 19 wherein said reference parameter is a pitch value of speech signals and wherein one of said first plurality of codebooks is selected responsive to the relative magnitude of the pitch value of said input speech signals and said pre-set pitch value.



24. The speech encoding method as claimed in claim 19 wherein each of said first plurality of codebooks and said second plurality of codebooks includes a codebook for a male voice and a codebook for a female voice.